Simple Randomized Mergesort on Parallel Disks

Authors

  • Rakesh Barve
  • Edward F. Grove
  • Jeffrey Scott Vitter

Abstract

We consider the problem of sorting a file of N records on the D-disk model of parallel I/O, in which there are two sources of parallelism: records are transferred to and from disk concurrently in blocks of B contiguous records, and in each I/O operation up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM makes only limited use of randomization and also has a useful deterministic version. Generalizing the technique of forecasting, our algorithm is able to read in, at any time, the “right” block from any disk; using the technique of flushing, it evicts, without any I/O overhead, just the “right” blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. By analyzing generalized maximum occupancy problems, we derive an analytical upper bound on SRM’s expected overhead that is valid for arbitrary inputs. This upper bound on SRM’s expected I/O performance shows that SRM is provably better than disk-striped mergesort (DSM) for realistic values of the parameters D, M (the internal memory size, in records), and B. Average-case simulations show performance even better than the analytical upper bound. Unlike previously proposed optimal sorting algorithms, SRM outperforms DSM even when the number D of parallel disks is small.
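Two ideas in the abstract can be illustrated with a small simulation: a disk layout that spreads each run's blocks across the disks so writes are fully parallel, and the maximum-occupancy quantity that governs how many parallel reads are needed when the blocks needed next happen to sit on the same disk. This is a minimal sketch, not SRM itself: the cyclic layout with a per-run starting disk (random in a randomized variant) and the names `cyclic_layout`, `parallel_steps`, and `mean_max_occupancy` are assumptions for illustration, and the paper's forecasting and flushing logic is not modeled.

```python
import random
from collections import Counter


def cyclic_layout(num_blocks: int, start_disk: int, num_disks: int) -> list[int]:
    """Return the disk index of each block of one sorted run.

    Assumed layout (hypothetical, for illustration): block j goes to disk
    (start_disk + j) mod num_disks, so any num_disks consecutive blocks occupy
    distinct disks. A randomized variant would draw start_disk uniformly per run.
    """
    return [(start_disk + j) % num_disks for j in range(num_blocks)]


def parallel_steps(disk_of_block: list[int]) -> int:
    """Minimum number of parallel I/O steps to transfer the given blocks, assuming
    all of them can be scheduled freely and each step moves at most one block per
    disk: this equals the load of the busiest disk."""
    return max(Counter(disk_of_block).values(), default=0)


def mean_max_occupancy(num_balls: int, num_disks: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected maximum occupancy when num_balls block
    requests land on independent, uniformly random disks (a simplified model of
    needed blocks colliding on the same disk)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        disks = [rng.randrange(num_disks) for _ in range(num_balls)]
        total += parallel_steps(disks)
    return total / trials


if __name__ == "__main__":
    D = 8
    layout = cyclic_layout(num_blocks=20, start_disk=5, num_disks=D)
    print(layout[:10])             # [5, 6, 7, 0, 1, 2, 3, 4, 5, 6]
    print(parallel_steps(layout))  # 3, i.e. ceil(20 / 8): perfect write parallelism
    # D requests on random disks almost always collide somewhere, so fetching
    # them takes more than the ideal single parallel read.
    print(mean_max_occupancy(num_balls=D, num_disks=D, trials=10_000))
```

The cyclic layout makes the write cost of a run exactly its block count divided by D (rounded up), while the random-placement estimate shows why reads are the hard part: the expected load of the busiest disk, not the average load, determines the number of parallel I/O steps.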

Similar Resources

Cole's Parametric Search Technique Made Practical

Parametric search has been widely used in geometric algorithms. Cole’s improvement provides a way of saving a logarithmic factor in the running time over what is achievable using the standard method. Unfortunately, this improvement comes at the expense of making an already complicated algorithm even more complex; hence, this technique has been mostly of theoretical interest. In this paper, we p...

Skeletons for Divide and Conquer Algorithms

Algorithmic skeletons intend to simplify parallel programming by providing recurring forms of program structure as predefined components. We present a fully distributed task parallel skeleton for a very general class of divide and conquer algorithms for MIMD machines with distributed memory. This approach is compared to a simple master-worker design. Based on experimental results for different e...

Parallel sorting on Intel Single-Chip Cloud computer

After multicore processors, many-core architectures are becoming increasingly popular in research activities. The development of multicore and many-core processors yields the opportunity to revisit classic sorting algorithms, which are important subroutines of many other algorithms. The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. It introduces 48 IA32 co...

Efficient On-Chip Pipelined Streaming Computations on Scalable Manycore Architectures

Performance of manycore processors is limited by programs’ use of off-chip main memory. Streaming computation organized in a pipeline limits accesses to main memory to tasks at boundaries of the pipeline to read or write to main memory. The Single-Chip Cloud Computer (SCC) offers 48 cores linked by a high-speed on-chip network, and allows the implementation of such an on-chip pipelined technique. W...

Earliest-finish-time-first algorithm

Dynamic programming can be very confusing until you’ve used it a bunch of times, so the best way to learn it is to simply do a whole bunch of examples. One way of viewing it is as a much more complicated version of divide-and-conquer a la mergesort or quicksort. In those cases, we could divide the problem into two subproblems, solve it optimally on each subproblem, and then combine the solution...

Publication date: 1997